This paper proposes a hardware mechanism for reducing coherency overhead occurring in 
scientific computations within DSM systems. A first phase aims at detecting, in the address 
space, regular patterns (called streams) of coherency events (such as exclusive, shared, or 
invalidation requests). Once a stream is detected at a loop level, regularity of 
data access can be exploited not only within the loop (spatial locality) but also between loops 
(temporal locality). We present a hardwa ...

Micro-architecture techniques in the Intel® E8870 scalable memory controller
Faye Briggs, Suresh Chittor, Kai Cheng 
conjunction with the 31st international symposium on computer 

This paper describes several selected micro-architectural tradeoffs and optimizations for 
the scalable memory controller of the Intel E8870 chipset architecture. The Intel E8870 
chipset architecture supports scalable coherent multiprocessor systems using 2 to 16 
processors and a point-to-point Scalability Port (SP) Protocol. The scalable memory 
controller micro-architecture applies a number of techniques to reduce local and remote 
latencies, both idle and loaded. The performance ... 



Keywords: distributed coherency, memory latency, scalability, transaction flows 



3 Parallel architectures: Inferential queueing and speculative push for reducing critical 
communication latencies 
Ravi Rajwar, Alain Kagi, James R. Goodman 

June 2003 Proceedings of the 17th annual international conference on Supercomputing 

Publisher: ACM Press 


Communication latencies within critical sections constitute a major bottleneck in some 
classes of emerging parallel workloads. In this paper, we argue for the use of Inferentially 
Queued Locks (IQLs) [31], not just for efficient synchronization but also for reducing 
communication latencies, and we propose a novel mechanism, Speculative Push (SP), 
aimed at reducing these communication latencies. With IQLs, the processor infers the 
existence, and limits, of a critical section from the use of synch ... 
Serialization of threads due to critical sections is a fundamental bottleneck to achieving 
high performance in multithreaded programs. Dynamically, such serialization may be 
unnecessary because these critical sections could have safely executed concurrently 
without locks. Current processors cannot fully exploit such parallelism because they do not 
have mechanisms to dynamically detect such false inter-thread dependences. We propose 
Speculative Lock Elision (SLE), a novel micro-architectura ... 

Piranha: a scalable architecture based on single-chip multiprocessing 

Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz 
The microprocessor industry is currently struggling with higher development costs and 
longer design times that arise from exceedingly complex processors that are pushing the 
limits of instruction-level parallelism. Meanwhile, such designs are especially ill suited for 
important commercial applications, such as on-line transaction processing (OLTP), which 
suffer from large memory stall times and exhibit little instruction-level parallelism. Given 
that commercial applications constitute by fa ... 

Performance of database workloads on shared-memory systems with out-of-order 
processors 

Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, Luiz Andre Barroso 
October 1998 ACM SIGPLAN Notices , ACM SIGOPS Operating Systems Review , 

Proceedings of the eighth international conference on Architectural 
support for programming languages and operating systems ASPLOS- 
Database applications such as online transaction processing (OLTP) and decision support 
systems (DSS) constitute the largest and fastest-growing segment of the market for 
multiprocessor servers. However, most current system designs have been optimized to 
perform well on scientific and engineering workloads. Given the radically different behavior 
of database workloads (especially OLTP), it is important to re-evaluate key system design 
decisions in the context of this important class of applicatio ... 



7 Interconnect-Aware Coherence Protocols for Chip Multiprocessors 

Liqun Cheng, Naveen Muralimanohar, Karthik Ramani, Rajeev Balasubramonian, John B. 
Carter 

June 2006 Proceedings of the 33rd International Symposium on Computer 
Architecture ISCA '06 

Publisher: IEEE Computer Society 


Improvements in semiconductor technology have made it possible to include multiple 
processor cores on a single die. Chip Multi-Processors (CMP) are an attractive choice for 
future billion transistor architectures due to their low design complexity, high clock 
frequency, and high throughput. In a typical CMP architecture, the L2 cache is shared by 
multiple cores and data coherence is maintained among private L1s. Coherence operations 
entail frequent communication over global on-chip wires. In fut ... 